ABSTRACT

Four computer programs using the general purpose multiple linear
regression program have been developed. Setwise regression analysis is a
stepwise procedure for sets of variables; there will be as many steps as
there are sets. COVARMLT allows a solution to the analysis of covariance
design with multiple covariates. A third program has three solutions to
the two-way disproportionate analysis of variance: (a) the method of
fitting constants, (b) the hierarchical model, and (c) the unadjusted main
effects solution. The fourth program yields three solutions to the two-way
analysis of covariance, with or without proportionality, and with
multiple covariates. The three solutions are similar to those described
for a two-way analysis of variance with disproportionate cell frequencies.

Four different specialized programs have been developed from a general
purpose multiple linear regression program. The programs that have been
developed by these authors are described below, together with an indication
of program availability and a description of the statistical technique.

Setwise Regression Analysis

Setwise regression analysis is a technique that was developed (Williams
and Lindem, 1971d) to allow a stepwise solution when the interest is in sets of
variables rather than in single variables. Thus, the setwise regression
procedure bears a strong resemblance to stepwise regression analysis, while
overcoming a disadvantage of the stepwise procedure.

The usual stepwise procedure becomes inappropriate when a variable with
more than two categories is binary coded. A simple example can be made with
religious affiliation. Four categories might be used: Catholic, Protestant,
Jewish, and other. Three binary predictors can be made from the first three
religious affiliations, and the fourth category can be represented as not
having membership in any of the first three categories. If religious
affiliation were used in conjunction with other information, the stepwise
procedure would not yield a valid indication of the importance of the set of
religious variables, because it enters or removes each of the three binary
predictors individually rather than as a unit. The setwise procedure, on the
other hand, allows a direct approach to such a situation.
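The binary coding just described can be sketched in Python as follows. This is an illustrative sketch, not the authors' program; the function name `binary_code` and the respondents are hypothetical. Each of the first k - 1 categories receives its own 0/1 predictor, and the last category ("other") is represented by zeros in every predictor, so the k - 1 columns together form one set.

```python
def binary_code(values, categories):
    """Return one 0/1 predictor column per category except the last.

    The last entry of `categories` is the reference category: a person in
    it has 0 in every column, so k categories yield k - 1 columns that
    belong together as a single set of predictors.
    """
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories[:-1]
    }


# Hypothetical respondents, for illustration only.
affiliation = ["Catholic", "Jewish", "other", "Protestant", "Catholic"]
religion_set = binary_code(
    affiliation, ["Catholic", "Protestant", "Jewish", "other"]
)
for name, column in religion_set.items():
    print(name, column)
# Catholic [1, 0, 0, 0, 1]
# Protestant [0, 0, 0, 1, 0]
# Jewish [0, 1, 0, 0, 0]
```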

The setwise procedure drops one set of variables at a time in a stepwise
fashion. There will be as many steps as there are sets. The solution is
accomplished by an iterative procedure that allows the R^2 (multiple
correlation coefficient squared) term to be maximized at each step in a
backward stepwise manner. Once a set is discarded, it is no longer considered
at later steps. One set is discarded at each step, until only one set remains.
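A minimal sketch of the backward setwise search, in plain Python with no external libraries, may make the procedure concrete. It is not the Williams and Lindem program: the function names (`solve`, `r_squared`, `setwise_backward`) are hypothetical, the least-squares fit uses the normal equations on centered data, and ties are broken by keeping the first set examined. Each set is a list of predictor columns (for example, the three binary religious-affiliation columns), so a whole set leaves the model as a unit.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """R^2 for the least-squares regression of y on columns (with intercept)."""
    n = len(y)
    y_mean = sum(y) / n
    yd = [v - y_mean for v in y]
    dev = []
    for col in columns:
        mean = sum(col) / n
        dev.append([v - mean for v in col])
    sxx = [[sum(p * q for p, q in zip(u, w)) for w in dev] for u in dev]
    sxy = [sum(p * q for p, q in zip(u, yd)) for u in dev]
    slopes = solve(sxx, sxy)
    ss_reg = sum(b * s for b, s in zip(slopes, sxy))
    return ss_reg / sum(d * d for d in yd)


def setwise_backward(y, sets):
    """Backward setwise elimination.

    sets maps a set name to its list of predictor columns. At each step the
    set whose removal leaves the largest R^2 is discarded and never
    reconsidered. Returns the (discarded set, R^2 without it) pairs and the
    name of the last remaining set.
    """
    remaining = dict(sets)
    steps = []
    while len(remaining) > 1:
        best_name, best_r2 = None, -1.0
        for name in remaining:
            cols = [c for other, cs in remaining.items() if other != name for c in cs]
            r2 = r_squared(y, cols)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        steps.append((best_name, best_r2))
        del remaining[best_name]
    return steps, next(iter(remaining))
```

The recorded R^2 values cannot increase from one step to the next, because each model is nested in the one before it; the size of each drop indicates how much the discarded set contributed beyond the sets still in the model.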

As a recent issue of VIEWPOINTS has included a complete solution to a
setwise problem (Williams, 1973), an example is omitted here. The documentation
for the setwise program is given in Williams and Lindem (1971b).

Analysis of Covariance with Multiple Covariates (COVARMLT)

Analysis of covariance programs are typically available, but many of
these programs severely limit the number of covariates, usually to one or two.
This limitation is wholly unnecessary. The analysis of covariance can be
conceptualized as a comparison of two linear models, and a multiple linear
regression solution follows in a straightforward manner.

It is helpful to look at the process of the analysis of covariance as
it can be generated through the use of linear models. Before the linear
models are developed, it is useful to set forth a concrete example. Suppose
15 students are split into three groups of five students each and are assigned
to three different methods of learning beginning typewriting. Prior to the
beginning of the instructional period, the students are given an intelligence
test and a test of manual dexterity. After the conclusion of the experiment,
a timed typing test is given. Table 1 contains the information for this
analysis.

TABLE 1

ANALYSIS OF COVARIANCE WITH TWO COVARIATES

  Post-Test   Intelligence   Manual      Group 1 = 1   Group 2 = 1
              Score          Dexterity   0 otherwise   0 otherwise
     35           120           38            1             0
     27            98           28            1             0
     32           102           32            1             0
     29           106           22            1             0
     27            94           30            1             0
     38           123           43            0             1
     25            96           31            0             1
     36           108           46            0             1
     35           115           40            0             1
     31           128           35            0             1
     27            90           27            0             0
     35           110           31            0             0
     19            94           25            0             0
     17            95           24            0             0
     32           116           33            0             0

Table 1 is constructed so that it might be easily transferred to IBM
cards for a solution through the use of multiple regression. The group
identifiers are binary coded and are found in columns 4 and 5. The group 1
identifier is given by a 1 in column 4, and the group identifier for group 2
is given by a 1 in column 5. A member of group 3 can be identified by having
a 0 in both columns 4 and 5. (If there are k groups, then there will be
k - 1 binary predictors for the group identifiers.)

To accomplish an analysis of covariance by regression, it is first
necessary to construct a full model. A full model is essentially a model
that contains all the information relevant to a data analysis. The full
model for the present situation is:

$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + e_1 \qquad (1)$$

where

$Y$ = the post-test score,
$X_1$ = the intelligence test score,
$X_2$ = the manual dexterity score,
$X_3$ = 1 if the score is from a member of group 1, 0 otherwise,
$X_4$ = 1 if the score is from a member of group 2, 0 otherwise,
$b_0$ = the Y-intercept,
$b_1$ through $b_4$ = the regression coefficients for $X_1$ through $X_4$, and
$e_1$ = the error in prediction with the full model.

If this model is solved using a multiple linear regression routine, part
of the output will include the multiple correlation coefficient (R). For the
present usage, since a full model is being used, the R value found from the use
of equation 1 can be labeled $R_{FM}$.

A restricted model can be developed using only the covariates as
predictor variables:

$$Y = b_0 + b_1X_1 + b_2X_2 + e_2 \qquad (2)$$

where

$Y$ = the post-test score,
$X_1$ = the intelligence test score,
$X_2$ = the manual dexterity score,
$b_0$ = the Y-intercept (this $b_0$ value will, in general, be different from the $b_0$ value in equation 1),
$b_1$, $b_2$ = the regression coefficients for $X_1$ and $X_2$ (these regression coefficients will, in general, be different from the $b_1$ and $b_2$ values in equation 1), and
$e_2$ = the error in prediction with the restricted model.

The restricted model also yields an R value, and it can be labeled $R_{RM}$.

The F test for the analysis of covariance is given by:

$$F = \frac{(R^2_{FM} - R^2_{RM})/(k - 1)}{(1 - R^2_{FM})/(N - k - C)} \qquad (3)$$

where

$k$ is the number of groups,
$N$ is the number of subjects, and
$C$ is the number of covariates.

Using the full model, an $R_{FM}$ value of .88021 is found; then
$R^2_{FM} = .77478$. For the restricted model, $R_{RM} = .83961$, so that
$R^2_{RM} = .70495$. Using equation 3,

$$F = \frac{(.77478 - .70495)/2}{(1 - .77478)/(15 - 3 - 2)} = 1.55.$$

This F value can be interpreted in the usual way, with degrees of freedom
equal to 2 and 10.
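The two-model comparison can be reproduced from the Table 1 data with a short, self-contained Python sketch. It is a hypothetical reimplementation for checking the arithmetic, not COVARMLT itself; `solve` and `r_squared` are illustrative helper names, and the fit uses the normal equations on centered data.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """Squared multiple correlation of y with the columns (intercept included)."""
    n = len(y)
    y_mean = sum(y) / n
    yd = [v - y_mean for v in y]
    dev = []
    for col in columns:
        mean = sum(col) / n
        dev.append([v - mean for v in col])
    sxx = [[sum(p * q for p, q in zip(u, w)) for w in dev] for u in dev]
    sxy = [sum(p * q for p, q in zip(u, yd)) for u in dev]
    slopes = solve(sxx, sxy)
    ss_reg = sum(b * s for b, s in zip(slopes, sxy))
    return ss_reg / sum(d * d for d in yd)


# Table 1: post-test (Y), intelligence (X1), manual dexterity (X2)
y = [35, 27, 32, 29, 27, 38, 25, 36, 35, 31, 27, 35, 19, 17, 32]
x1 = [120, 98, 102, 106, 94, 123, 96, 108, 115, 128, 90, 110, 94, 95, 116]
x2 = [38, 28, 32, 22, 30, 43, 31, 46, 40, 35, 27, 31, 25, 24, 33]
x3 = [1] * 5 + [0] * 10            # group 1 identifier (column 4)
x4 = [0] * 5 + [1] * 5 + [0] * 5   # group 2 identifier (column 5)

r2_fm = r_squared(y, [x1, x2, x3, x4])   # full model, equation 1
r2_rm = r_squared(y, [x1, x2])           # restricted model, equation 2
k, n, c = 3, len(y), 2                   # groups, subjects, covariates
f_ratio = ((r2_fm - r2_rm) / (k - 1)) / ((1 - r2_fm) / (n - k - c))
# The text reports R2_FM = .77478, R2_RM = .70495, and F = 1.55.
print(f"R2_FM = {r2_fm:.5f}  R2_RM = {r2_rm:.5f}  F = {f_ratio:.2f}")
```

Adding a covariate only means appending its column to both predictor lists and increasing `c` by one.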
Finding the Adjusted Means

For two covariates, the adjusted mean can be found for each group using
equation 4:

$$\bar{Y}_k(\text{adj}) = \bar{Y}_k - b_1(\bar{X}_{1k} - \bar{X}_{1T}) - b_2(\bar{X}_{2k} - \bar{X}_{2T}) \qquad (4)$$

where

$\bar{Y}_k(\text{adj})$ = the adjusted criterion mean of the kth group,
$\bar{Y}_k$ = the criterion mean of the kth group,
$b_1$ = the regression coefficient for the first covariate in the full model,
$\bar{X}_{1k}$ = the mean of the kth group on the first covariate,
$\bar{X}_{1T}$ = the overall mean of the first covariate,
$b_2$ = the regression coefficient for the second covariate in the full model,
$\bar{X}_{2k}$ = the mean of the kth group on the second covariate, and
$\bar{X}_{2T}$ = the overall mean of the second covariate.

Additional covariates can be added with no difficulty in an analogous manner.

For the present data, $\bar{Y}_1 = 30$, $\bar{Y}_2 = 33$, $\bar{Y}_3 = 26$,
$\bar{X}_{11} = 104$, $\bar{X}_{12} = 114$, $\bar{X}_{13} = 101$, $\bar{X}_{1T} = 106.33$,
$\bar{X}_{21} = 30$, $\bar{X}_{22} = 39$, $\bar{X}_{23} = 28$, and $\bar{X}_{2T} = 32.33$.
Also, $b_1 = .19514$ and $b_2 = .63027$ (their values are found directly from the
printout for the full model).

$$\bar{Y}_1(\text{adj}) = 30 - (.19514)(104 - 106.33) - (.63027)(30 - 32.33) = 31.92$$

$$\bar{Y}_2(\text{adj}) = 33 - (.19514)(114 - 106.33) - (.63027)(39 - 32.33) = 27.30$$

$$\bar{Y}_3(\text{adj}) = 26 - (.19514)(101 - 106.33) - (.63027)(28 - 32.33) = 29.77$$

The process of adjusting the means can be seen as a way to "control," to some
extent, the group differences on the covariates.
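Equation 4 extends to any number of covariates by subtracting one term per covariate. The short sketch below, with the hypothetical helper name `adjusted_mean`, applies it to the rounded values quoted in the text.

```python
def adjusted_mean(y_bar_k, b, x_bar_k, x_bar_total):
    """Equation 4 for any number of covariates:
    Y_k(adj) = Y_k - sum over j of b_j * (X_jk - X_jT)."""
    return y_bar_k - sum(
        bj * (xk - xt) for bj, xk, xt in zip(b, x_bar_k, x_bar_total)
    )


b = [0.19514, 0.63027]       # full-model coefficients for X1 and X2
x_total = [106.33, 32.33]    # overall covariate means
groups = {                   # group: (criterion mean, covariate means)
    1: (30, [104, 30]),
    2: (33, [114, 39]),
    3: (26, [101, 28]),
}
adjusted = {
    k: adjusted_mean(y_bar, b, x_bar, x_total)
    for k, (y_bar, x_bar) in groups.items()
}
for k, value in adjusted.items():
    print(k, round(value, 2))
```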

Forming a Summary Table 

Forming a summary table for the analysis of covariance when using a
regression approach is a relatively straightforward process. The sum of
squares within is found directly from the printout from the full model and is
118.32; equivalently, $SS_W = SS_T(1 - R^2_{FM}) = 525.33(1 - .77478) = 118.32$.
The adjusted total sum of squares is given by $SS_T(\text{adj}) = SS_T(1 - R^2_{RM})$,
where $R_{RM}$ is the multiple correlation between Y and the covariates (the
restricted model), which, in the present case, is $R_{RM} = .83961$; also,
$R^2_{RM} = .70495$. With $SS_T = 525.33$, $SS_T(\text{adj}) = 525.33(1 - .70495) = 155.00$.
The adjusted sum of squares among, $SS_A(\text{adj})$, can be found as a residual
and is 155.00 - 118.32 = 36.68. The summary table is given in Table 2.

TABLE 2

SUMMARY TABLE FOR THE ANALYSIS OF COVARIANCE WITH TWO COVARIATES

  Source of Variation    df       SS       MS       F
  Among                   2    36.68    18.34    1.55
  Within                 10   118.32    11.83
  Total                  12   155.00
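The summary-table arithmetic can be sketched as a single function of $SS_T$ and the two $R^2$ values. The function and argument names are assumptions for illustration; `c` is the number of covariates, as in equation 3, and $SS_T$ is passed before rounding (525.333) so that the printed values agree with Table 2.

```python
def ancova_summary(ss_total, r2_fm, r2_rm, k, n, c):
    """One-way ANCOVA summary table built from the full- and restricted-model R^2.

    Returns {source: (df, SS, MS, F)}, with None where a cell is blank.
    """
    ss_within = ss_total * (1 - r2_fm)      # error SS of the full model
    ss_total_adj = ss_total * (1 - r2_rm)   # error SS of the restricted model
    ss_among = ss_total_adj - ss_within     # adjusted among, found as a residual
    df_among, df_within = k - 1, n - k - c
    ms_among, ms_within = ss_among / df_among, ss_within / df_within
    return {
        "Among": (df_among, ss_among, ms_among, ms_among / ms_within),
        "Within": (df_within, ss_within, ms_within, None),
        "Total": (df_among + df_within, ss_total_adj, None, None),
    }


table = ancova_summary(525.333, 0.77478, 0.70495, k=3, n=15, c=2)
for source, (df, ss, ms, f_ratio) in table.items():
    ms_txt = f"{ms:.2f}" if ms is not None else ""
    f_txt = f"{f_ratio:.2f}" if f_ratio is not None else ""
    print(f"{source:<8}{df:>4}{ss:>9.2f}{ms_txt:>8}{f_txt:>6}")
```

Each extra covariate lowers `n - k - c`, which is the loss of one within degree of freedom per covariate discussed next.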
It should be clear from this presentation that any number of covariates could
be employed in an analysis of covariance. Potential researchers should be
cautioned, however, against a "slop bucket" approach of using a large number of
covariates simply because it is possible. In addition to being unscientific,
the use of each covariate entails the loss of one degree of freedom in the
adjusted sum of squares within term. A person could use 25 covariates with
ease; he should, however, be familiar enough with the data to make a reasonable
interpretation of that data after the adjustment. A program has been
prepared (Williams and Lindem, 1974a) to accommodate up to 20 covariates
(it can be redimensioned to include more covariates if necessary); the
program prints out summary tables for the analysis of variance of the
criterion scores and for the analysis of covariance with the multiple
covariates, together with the adjusted means.

Two-Way Fixed Effects Analysis of Variance with Disproportionate Cell
Frequencies

The solution to the disproportionate case of the two-way fixed effects
analysis of variance is complicated by the existence of more than one
solution, the different solutions being dependent upon the assumptions of
the researcher. The present program (Williams and Lindem, 1972) allows
for the selection of any (or all) of the following least squares solutions:
(a) the method of fitting constants, a commonly accepted solution described
in Scheffé (1959) and Anderson and Bancroft (1952), which adjusts each main
effect for the other main effect; (b) the hierarchical model (Cohen, 1968),
which allows one effect to take precedence over the second effect: the first
main effect is unadjusted, and the second main effect is adjusted for the
first main effect; and (c) the unadjusted main effects method, in which
neither main effect is adjusted for the other main effect. In all three
methods, the interaction effect is adjusted for the two main effects. The
three least squares methods and the previously mentioned approximate
solutions are compared by Williams (1972).

As an example of the solutions to the disproportionate two-way 
situation, consider the following data in Table 3. 

TABLE 3

DATA FOR DISPROPORTIONATE TWO-WAY ANALYSIS OF VARIANCE

                             B Effect
                    1            2                 3
  A Effect   1    8, 6, 4      1, 1              6, 2
             2    10           7, 5, 4, 4, 3     10, 9, 7, 5, 4

To solve for any of the three solutions, four linear models are necessary:

$$\text{Model I:}\quad Y = b_0 + b_1X_1 + e_1 \qquad (5)$$

$$\text{Model II:}\quad Y = b_0 + b_2X_2 + b_3X_3 + e_2 \qquad (6)$$

$$\text{Model III:}\quad Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + e_3 \qquad (7)$$

$$\text{Model IV:}\quad Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + e_4 \qquad (8)$$

where

$Y$ = the criterion;
$X_1$ = 1 if the score is from a member of row 1, and 0 otherwise;
$X_2$ = 1 if the score is from a member of column 1, and 0 otherwise;
$X_3$ = 1 if the score is from a member of column 2, and 0 otherwise;
$X_4 = X_1 \cdot X_2$;
$X_5 = X_1 \cdot X_3$;
$b_0$ through $b_5$ = regression coefficients (the values for $b_0$, $b_1$, $b_2$, and $b_3$ will, in general, be different for Models I-IV); and
$e_1$ through $e_4$ = the errors in prediction with their respective models.

Table 4 contains a formulation for the regression solutions to the two-way
fixed effects analysis of variance with disproportionate cell frequencies.

TABLE 4

REGRESSION FORMULATION FOR THE TWO-WAY ANALYSIS OF VARIANCE

    Y    X1   X2   X3   X4   X5
    8    1    1    0    1    0
    6    1    1    0    1    0
    4    1    1    0    1    0
    1    1    0    1    0    1
    1    1    0    1    0    1
    6    1    0    0    0    0
    2    1    0    0    0    0
   10    0    1    0    0    0
    7    0    0    1    0    0
    5    0    0    1    0    0
    4    0    0    1    0    0
    4    0    0    1    0    0
    3    0    0    1    0    0
   10    0    0    0    0    0
    9    0    0    0    0    0
    7    0    0    0    0    0
    5    0    0    0    0    0
    4    0    0    0    0    0
The values from the regression program that are useful for completing
the analysis of variance are the sums of squares attributable to regression
for Models I, II, III, and IV, and the sum of squares for deviation from
regression for Model IV. These values, together with the $R^2$ values, are given
in Table 5. The total sum of squares is, of course, available from all four models.
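Table 5 can be reproduced, as a rough check, by fitting Models I-IV to the Table 4 data. The sketch below is a hypothetical reimplementation in plain Python rather than the authors' program; `solve` and `regression_ss` are illustrative names, and the interaction columns are formed as $X_4 = X_1X_2$ and $X_5 = X_1X_3$, following the model definitions.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_ss(y, columns):
    """(SS attributable to regression, total SS) for y on columns, with intercept."""
    n = len(y)
    y_mean = sum(y) / n
    yd = [v - y_mean for v in y]
    dev = []
    for col in columns:
        mean = sum(col) / n
        dev.append([v - mean for v in col])
    sxx = [[sum(p * q for p, q in zip(u, w)) for w in dev] for u in dev]
    sxy = [sum(p * q for p, q in zip(u, yd)) for u in dev]
    slopes = solve(sxx, sxy)
    return sum(b * s for b, s in zip(slopes, sxy)), sum(d * d for d in yd)


# Table 4: criterion, plus the row and column each score comes from
y = [8, 6, 4, 1, 1, 6, 2, 10, 7, 5, 4, 4, 3, 10, 9, 7, 5, 4]
row = [1] * 7 + [2] * 11
column = [1, 1, 1, 2, 2, 3, 3, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x1 = [1 if r == 1 else 0 for r in row]       # row 1
x2 = [1 if c == 1 else 0 for c in column]    # column 1
x3 = [1 if c == 2 else 0 for c in column]    # column 2
x4 = [p * q for p, q in zip(x1, x2)]         # X4 = X1 * X2
x5 = [p * q for p, q in zip(x1, x3)]         # X5 = X1 * X3

models = {
    "I": [x1],
    "II": [x2, x3],
    "III": [x1, x2, x3],
    "IV": [x1, x2, x3, x4, x5],
}
ss_reg = {}
for name, cols in models.items():
    ss_reg[name], ss_total = regression_ss(y, cols)
    print(f"Model {name:<3} SS_reg = {ss_reg[name]:6.2f}  R2 = {ss_reg[name] / ss_total:.5f}")
ss_within = ss_total - ss_reg["IV"]   # deviation from regression, Model IV
print(f"Within = {ss_within:.2f}  Total = {ss_total:.2f}")
```

Up to rounding, this should match Table 5. Model III is the only one that needs a genuine least-squares solution, because Models I, II, and IV simply reproduce the row, column, or cell means.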

The A effect for the method of fitting constants is the difference
between the sums of squares attributable to regression for Model III
and Model II:

$$SS_A = 80.25 - 37.43 = 42.82.$$

Essentially, this process amounts to finding that part of the A effect
that is independent of the B effect.

The B effect for this method is the difference between the sums of
squares attributable to regression for Model III and Model I:

$$SS_B = 80.25 - 20.36 = 59.89.$$

TABLE 5

VALUES FOUND FROM THE REGRESSION ANALYSIS

                                           df       SS       R²
  Model I (A effect)
     Attributable to Regression             1     20.36   .15427
  Model II (B effect)
     Attributable to Regression             2     37.43   .28355
  Model III (Combined A & B effects)
     Attributable to Regression             3     80.25   .60796
  Model IV (Full Model)
     Attributable to Regression             5     80.80   .61212
     Deviation from Regression             12     51.20
  Total Sum of Squares                     17    132.00

Similarly, this second calculation yields that part of the B effect
that is independent of the A effect.

And finally, the interaction is found as the difference between Model
IV and Model III: $SS_{AB} = 80.80 - 80.25 = .55$. Thus, the effect found in
this manner is the AB effect independent of the A and B effects.

The sum of squares within is equal to the deviation from regression
for Model IV. This information for the data in Table 4 can be put into a
summary table (Table 6).

TABLE 6

SUMMARY TABLE FOR THE METHOD OF FITTING CONSTANTS

  Source of Variation    df      SS      MS       F
  A                       1   42.82   42.82   10.03**
  B                       2   59.89   29.95    7.01**
  AB                      2     .55     .28     .07
  Within                 12   51.20    4.27

  **p < .01
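The fitting-constants arithmetic behind Table 6 can be checked with a few lines of Python. This is an illustrative sketch using the rounded Table 5 values; because it divides by the unrounded mean square within (51.20/12 = 4.2667) rather than 4.27, the last digit of a printed F ratio may differ slightly from Table 6.

```python
# Sums of squares attributable to regression, from Table 5.
ss_reg = {"I": 20.36, "II": 37.43, "III": 80.25, "IV": 80.80}
ss_within, df_within = 51.20, 12
ms_within = ss_within / df_within

fitting_constants = {
    "A": (1, ss_reg["III"] - ss_reg["II"]),    # A independent of B
    "B": (2, ss_reg["III"] - ss_reg["I"]),     # B independent of A
    "AB": (2, ss_reg["IV"] - ss_reg["III"]),   # AB independent of A and B
}
for source, (df, ss) in fitting_constants.items():
    ms = ss / df
    print(f"{source:<7}{df:>3}{ss:>8.2f}{ms:>8.2f}{ms / ms_within:>8.2f}")
print(f"Within {df_within:>3}{ss_within:>8.2f}{ms_within:>8.2f}")
```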



The method of fitting constants is not a partitioning model. That is,
if the sums of squares are totaled, they do not equal the total sum of squares
of 132.00 (the total is 154.46).



The Hierarchical Model

The hierarchical model (Cohen, 1968) is a method similar to the method
of fitting constants. With this approach, a researcher is required to order
the variables in relation to their research interest. For example, a
researcher may be most interested in the A, or row, effect, less interested
in the B, or column, effect, and may have little interest in the interaction
effect. With this approach, each effect is adjusted only for those effects
preceding it in the ordering. Thus, the A effect is found directly, the B
effect is adjusted for the A effect, and the AB effect is adjusted for
the combined A and B effects. Unlike the previous model, this model is
additive in the sense that the sum $SS_A + SS_B + SS_{AB} + SS_W$ is equal to $SS_T$.
The values for $SS_A$, $SS_B$, $SS_{AB}$, and $SS_W$ can be found from Table 5:
$SS_A = 20.36$, the unadjusted A effect; $SS_B = 80.25 - 20.36 = 59.89$, that
part of the B effect independent of A; $SS_{AB} = 80.80 - 80.25 = .55$, as
previously; and $SS_W = 51.20$.

These values are placed in a usual summary table (Table 7).

TABLE 7

SUMMARY TABLE FOR THE HIERARCHICAL MODEL

  Source of Variation    df       SS      MS       F
  A                       1    20.36   20.36    4.77*
  B                       2    59.89   29.95    7.01**
  AB                      2      .55     .28     .07
  Within                 12    51.20    4.27
  Total                  17   132.00

  *p < .05
  **p < .01



The results from this analysis are identical to those of the fitting constants
method except for the $SS_A$ term. The interpretation would be somewhat
different, however, because of the decrease in size of the $SS_A$ term (from
42.82 to 20.36). If, on the other hand, the researcher had chosen the order of
experimental interest as B, A, AB, then the F values for the A effect and the
AB effect would be unchanged from the fitting constants method, but the B
effect would be smaller.
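Because the hierarchical sums of squares depend on the chosen order, a small sketch can make the B, A, AB case concrete. The helper name `hierarchical` and the frozenset keys are illustrative assumptions; the inputs are the rounded Table 5 values, and only the main-effect orderings vary, since the interaction is always entered last.

```python
def hierarchical(order, ss_reg):
    """Sequential (hierarchical) main-effect sums of squares.

    Each effect is adjusted only for the effects that precede it in `order`.
    ss_reg maps a frozenset of main effects to the SS attributable to
    regression for the model containing exactly those effects.
    """
    result, entered = {}, frozenset()
    for effect in order:
        now = entered | {effect}
        result[effect] = ss_reg[now] - ss_reg[entered]
        entered = now
    return result


# Rounded values from Table 5.
ss_reg = {
    frozenset(): 0.0,
    frozenset({"A"}): 20.36,        # Model I
    frozenset({"B"}): 37.43,        # Model II
    frozenset({"A", "B"}): 80.25,   # Model III
}
ss_ab = 80.80 - 80.25   # Model IV - Model III, entered last in every ordering
ss_w = 51.20

for order in (["A", "B"], ["B", "A"]):
    main = hierarchical(order, ss_reg)
    total = sum(main.values()) + ss_ab + ss_w
    print(order, {k: round(v, 2) for k, v in main.items()}, "total", round(total, 2))
```

With the order B, A the A term equals the fitting-constants value (42.82) and B shrinks to its unadjusted 37.43; either ordering still totals 132.00.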

The Unadjusted Main Effects Method

A solution similar to the two previous least squares solutions can be
called the unadjusted main effects method. Using this approach, both the
A and B effects are found directly, with the interaction found in the same
manner as in the method of fitting constants and the hierarchical model. The
error term (mean square within) is, of course, the same. The values for
$SS_A$, $SS_B$, $SS_{AB}$, and $SS_W$ can be found from Table 5: $SS_A = 20.36$, the
unadjusted A effect; $SS_B = 37.43$, the unadjusted B effect;
$SS_{AB} = 80.80 - 80.25 = .55$, as previously; and $SS_W = 51.20$.

Table 8 contains the unadjusted main effects method analysis.

TABLE 8

SUMMARY TABLE FOR THE UNADJUSTED MAIN EFFECTS METHOD

  Source of Variation    df      SS      MS       F
  A                       1   20.36   20.36    4.77*
  B                       2   37.43   18.72    4.38*
  AB                      2     .55     .28     .07
  Within                 12   51.20    4.27

  *p < .05



If the sums of squares are totaled for Table 8, the total is less
than 132.00 because of the suppressor relationship between A and B
(the total for Table 8 is actually 109.54). The unadjusted main effects
method is identical, as a solution, to the one proposed by Jennings
(1967). That Jennings' approach and the unadjusted main effects method
yield the same results was shown by Halldorson (1969).
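To see the three solutions side by side, the sketch below totals each method's sums of squares from the rounded Table 5 values. It is illustrative only, and the dictionary layout is an assumption. Only the hierarchical solution returns $SS_T = 132.00$; the fitting constants total (154.46) is larger and the unadjusted main effects total (109.54) is smaller.

```python
ss_reg = {"I": 20.36, "II": 37.43, "III": 80.25, "IV": 80.80}  # Table 5
ss_ab = ss_reg["IV"] - ss_reg["III"]   # identical in all three methods
ss_w, ss_t = 51.20, 132.00

methods = {  # method: (SS_A, SS_B)
    "fitting constants": (ss_reg["III"] - ss_reg["II"], ss_reg["III"] - ss_reg["I"]),
    "hierarchical (A, B)": (ss_reg["I"], ss_reg["III"] - ss_reg["I"]),
    "unadjusted main effects": (ss_reg["I"], ss_reg["II"]),
}
totals = {}
for name, (ss_a, ss_b) in methods.items():
    totals[name] = ss_a + ss_b + ss_ab + ss_w
    print(f"{name:<24} total = {totals[name]:.2f} (SS_T = {ss_t:.2f})")
```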

Two-Way Analysis of Covariance with Multiple Covariates and Proportionate or
Disproportionate Cell Frequencies

The present program (Williams and Lindem, 1974b) is a generalized two-way
fixed effects analysis of covariance program that allows multiple
covariates and/or disproportionality of the cell frequencies. Because the
program is general, it can be used whether or not there are multiple
covariates and whether or not the cell frequencies are disproportionate.
As was true of the program documented for the two-way fixed effects
analysis of variance with disproportionate cell frequencies, three distinct
solutions exist for this analysis of covariance situation: (1) the method
of fitting constants, a solution that adjusts each main effect for the
covariates and the other main effect; (2) the hierarchical model, which
allows one main effect to take precedence over the second main effect: the
first main effect is adjusted only for the covariates, and the second main
effect is adjusted for both the first main effect and the covariates; and
(3) the unadjusted main effects method, in which the main effects are
adjusted only for the covariates. In all three solutions, the interaction
effect is adjusted for the covariates and the two main effects. These
three solutions are analogous to the previously documented solutions for
the fixed effects analysis of variance with disproportionate cell
frequencies.

As an illustrative example, suppose the data are cast in a 2 × 3
table with two covariates. Then the following models could be generated:

$$\text{Model V:}\quad Y = b_0 + b_6X_6 + b_7X_7 + e_5 \qquad (9)$$

$$\text{Model VI:}\quad Y = b_0 + b_1X_1 + b_6X_6 + b_7X_7 + e_6 \qquad (10)$$

$$\text{Model VII:}\quad Y = b_0 + b_2X_2 + b_3X_3 + b_6X_6 + b_7X_7 + e_7 \qquad (11)$$

$$\text{Model VIII:}\quad Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_6X_6 + b_7X_7 + e_8 \qquad (12)$$

$$\text{Model IX:}\quad Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + b_6X_6 + b_7X_7 + e_9 \qquad (13)$$

where $Y$, $X_1$ through $X_5$, and $b_0$ through $b_5$ are defined as previously
given in the solution for disproportionate cell frequencies for a two-way
analysis of variance, and

$X_6$ = the score on the first covariate for each subject,
$X_7$ = the score on the second covariate for each subject,
$b_6$, $b_7$ = the regression coefficients for $X_6$ and $X_7$, respectively ($b_0$ through $b_7$ will, in general, be different for Models V-IX), and
$e_5$ through $e_9$ = the errors in prediction for Models V-IX.

In what follows, $SS_{\text{reg}}(M)$ denotes the sum of squares attributable
to regression for Model M, and $SS_{\text{dev}}(M)$ the sum of squares for
deviation from regression. Then, for the fitting constants solution,

$$SS_A = SS_{\text{reg}}(\text{VIII}) - SS_{\text{reg}}(\text{VII}) \qquad (14)$$

$$SS_B = SS_{\text{reg}}(\text{VIII}) - SS_{\text{reg}}(\text{VI}) \qquad (15)$$

$$SS_{AB} = SS_{\text{reg}}(\text{IX}) - SS_{\text{reg}}(\text{VIII}) \qquad (16)$$

and

$$SS_W = SS_{\text{dev}}(\text{IX}) \qquad (17)$$

For the hierarchical solution with primary interest in the A effect,

$$SS_A = SS_{\text{reg}}(\text{VI}) - SS_{\text{reg}}(\text{V}) \qquad (18)$$

with $SS_B$ the same as equation 15, $SS_{AB}$ the same as equation 16, and
$SS_W$ the same as equation 17.

For the unadjusted main effects solution, $SS_A$ is the same as equation 18,

$$SS_B = SS_{\text{reg}}(\text{VII}) - SS_{\text{reg}}(\text{V}) \qquad (19)$$

and $SS_{AB}$ and $SS_W$ are the same as equations 16 and 17, respectively.

The fitting constants solution for the analysis of covariance can be seen
as analogous to the fitting constants solution for the two-way analysis of
variance, except that the covariates are also removed as a source of variation;
thus, the A effect in the fitting constants solution is that portion
independent of both the B effect and the covariates. In the hierarchical
solution, the effect of primary research interest is adjusted for the
covariates only; in the unadjusted main effects solution, the main effects
are adjusted for the covariates only, and not for the other main effect.
The interaction effect and within term are the same for all three
solutions.
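The document gives no data for the two-way covariance design, so the sketch below only encodes equations 14 through 19 as a mapping from the regression output of Models V-IX to the three solutions. The function name and dictionary keys are assumptions for illustration; the sums of squares themselves would come from fitting the five models, for instance with a least-squares routine like the one used for Models I-IV.

```python
def two_way_ancova_ss(ss_reg, ss_dev_ix):
    """Map regression sums of squares for Models V-IX onto the three solutions.

    ss_reg: SS attributable to regression, keyed "V", "VI", "VII", "VIII", "IX".
    ss_dev_ix: SS for deviation from regression for Model IX.
    """
    ab = ss_reg["IX"] - ss_reg["VIII"]                  # equation 16
    w = ss_dev_ix                                       # equation 17
    a_covariates_only = ss_reg["VI"] - ss_reg["V"]      # equation 18
    b_adjusted = ss_reg["VIII"] - ss_reg["VI"]          # equation 15
    return {
        "fitting constants": {
            "A": ss_reg["VIII"] - ss_reg["VII"],        # equation 14
            "B": b_adjusted, "AB": ab, "W": w,
        },
        "hierarchical (A first)": {
            "A": a_covariates_only, "B": b_adjusted, "AB": ab, "W": w,
        },
        "unadjusted main effects": {
            "A": a_covariates_only,
            "B": ss_reg["VII"] - ss_reg["V"],           # equation 19
            "AB": ab, "W": w,
        },
    }
```

Summing the hierarchical components gives $SS_{\text{reg}}(\text{IX}) - SS_{\text{reg}}(\text{V}) + SS_{\text{dev}}(\text{IX})$, that is, the total sum of squares less the part attributable to the covariates, so this solution again partitions a (covariate-adjusted) total.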

The solutions for COVARMLT (the analysis of covariance with multiple
covariates) and the two-way analysis of covariance described here do not
include a test for the homogeneity of the regression on the covariates.
Future revisions of these two programs will include options for running
these tests if the user so desires.